%%{
init: {
'theme': 'base',
'themeVariables': {
'background': '#000',
'primaryColor': '#BB2528',
'primaryTextColor': '#fff',
'primaryBorderColor': '#7C0000',
'lineColor': '#F8B229',
'secondaryColor': '#006100',
'tertiaryColor': '#fff'
}
}
}%%
%%| label: fig-full-model
%%| fig-width: 7
%%| fig-cap: |
%%| The complete model for data lineage that we can expand into when ready.
%%| We will use a simplified version to start with and improve as the quality of the data we have access to increases.
%%|
graph LR
subgraph "Full Model"
A[Application]:::entity
BA[BusinessAttribute]:::entity
BKA[BusinessKeyActivity]:::entity
BP[BusinessProcess]:::entity
BT[BusinessTerm]:::entity
CO[ComplianceOfficer]:::entity
CP[CompliancePolicy]:::entity
DC[DataClassification]:::entity
DE[DataElement]:::entity
DQI[DataQualityIssue]:::entity
DQM[DataQualityMetric]:::entity
DQR[DataQualityRule]:::entity
DS[DataSet]:::entity
LOB[LineOfBusiness]:::entity
PS[ProcessSteward]:::entity
R[Regulation]:::entity
SR[SystemOfRecord]:::entity
TLC[TradeLifecycle]:::entity
A --IMPACTS--> A
DS --CONSUMED_BY--> A
DS --CREATED_BY--> A
A --UTILIZES--> BP
BA --MAPS_TO--> DE
BA --PART_OF--> BT
BKA --PART_OF--> BP
BP --CONSUMED_BY--> DS
BP --CREATED_BY--> DS
BP --GOVERNED_BY--> CP
BP --GOVERNED_BY--> R
BP --HAS_STAGE--> TLC
CO --RESPONSIBLE_FOR--> CP
CO --RESPONSIBLE_FOR--> DS
CO --RESPONSIBLE_FOR--> R
CP --APPLIES_TO--> DS
CP --APPLIES_TO--> DE
CP --GOVERNED_BY--> R
DC --CLASSIFIED_AS--> DS
DE --BELONGS_TO--> DS
DE --GOVERNED_BY--> R
DE --MEASURES--> DQM
DE --VIOLATES--> DQI
DE --VIOLATES--> DQR
DE --VIOLATES--> DQM
DS --GOVERNED_BY--> R
DS --PRODUCES--> SR
DQM --VIOLATES--> DQI
DQM --VIOLATES--> DQR
DQI --VIOLATES--> DQR
DQR --APPLIES_TO--> DS
DQR --APPLIES_TO--> DE
LOB --CONTAINS--> BP
PS --RESPONSIBLE_FOR--> BP
classDef entity fill:#f9f9f9,stroke:#333,stroke-width:1px;
classDef relation fill:#EFEFEF,stroke:#666,stroke-width:1px;
end
FI Data Lineage Model
Fixed Income - Data Lineage Model Documentation
This data lineage model is designed to track the flow of data across systems, applications, and business processes in a financial services trading firm. The model includes nodes and relationships that cover data governance, regulations and compliance, data quality, and data security.
Core model concepts
Nodes
- Application: Represents systems or applications that are part of the data flow.
- BusinessAttribute: Represents attributes of a BusinessTerm (e.g., Customer ID, Customer Name, Email).
- BusinessKeyActivity: Represents a sub-step of a BusinessProcess.
- BusinessProcess: Represents activities or sets of activities that accomplish a specific organizational goal.
- BusinessTerm: Represents an entity or object that a BusinessProcess operates on (e.g., Customer).
- ComplianceOfficer: Represents an individual responsible for ensuring regulatory compliance and overseeing the firm’s adherence to data governance policies.
- CompliancePolicy: Represents the firm’s internal compliance policies and guidelines.
- DataAsset: Represents a data asset, such as a table, file, or database.
- DataClassification: Represents a logical grouping of types of DataElements (e.g., email address, phone number, country code).
- DataConfidentiality: Represents data security classification levels (e.g., Confidential, Restricted, Public).
- DataElement: Represents the implementation of a BusinessAttribute in a DataSet (e.g., a field in a table).
- DataQualityIssue: Represents identified data quality problems or violations.
- DataQualityMetric: Represents quantifiable measures of data quality, such as completeness, accuracy, or timeliness.
- DataQualityRule: Represents rules that define data quality expectations and requirements.
- DataSet: Represents a logical aggregation of DataElements for common use.
- LineOfBusiness: Represents an organizational unit of the business.
- ProcessSteward: Represents the process equivalent of the data steward.
- Regulation: Represents specific regulatory requirements applicable to the financial services industry (e.g., GDPR, MiFID II, Dodd-Frank).
- SystemOfRecord: Represents the authoritative data source for a given data element or piece of information.
- TradeLifecycle: Represents the different stages that a trade goes through, from placement to settlement.
- Transformation: Represents a data transformation process or job.
Relationships
- APPLIES_TO: Connects a DataQualityRule to a DataElement or DataSet that it applies to.
- CLASSIFIED_AS: Connects a DataSet to a DataConfidentiality, indicating its security classification level.
- CONSUMED_BY: Connects a DataSet (source data) to an Application or BusinessProcess that consumes the data without creating or transforming it.
- CREATED_BY: Connects a DataSet (output data) to an Application or BusinessProcess that creates or transforms the data.
- GOVERNED_BY: Connects a DataElement, DataSet, or CompliancePolicy to a Regulation, indicating which regulatory requirements apply.
- IMPACTS: Connects Applications to show the downstream impact of changes.
- MAPS_TO: Connects a BusinessAttribute to a DataElement.
- MEASURES: Connects a DataQualityMetric to a DataElement or DataSet that it measures.
- PART_OF: Connects a BusinessAttribute to its corresponding BusinessTerm.
- PRODUCES: Connects a SystemOfRecord to a DataSet that it produces as output.
- RESPONSIBLE_FOR: Connects a ProcessSteward, ComplianceOfficer, or DataSteward to a BusinessProcess, Regulation, CompliancePolicy, or DataAsset, indicating their responsibility for ensuring compliance, data governance, or process oversight.
- UTILIZES: Connects a BusinessProcess to an Application that is used to accomplish the process.
- VIOLATES: Connects a DataQualityIssue to a DataElement, DataSet, DataQualityRule, or DataQualityMetric that it violates.
Model Diagrams
Full model diagram
The model is of higher fidelity than the data we currently have access to. To convey its scope, we present a graph of the full model.
Simplified model diagram
The simplified version of the model, shown here, will serve as version 1 (the Minimum Viable Product) of the model.
%%{
init: {
'theme': 'base',
'themeVariables': {
'background': '#000',
'primaryColor': '#BB2528',
'primaryTextColor': '#fff',
'primaryBorderColor': '#7C0000',
'lineColor': '#F8B229',
'secondaryColor': '#006100',
'tertiaryColor': '#fff'
}
}
}%%
%%| label: fig-simple-model
%%| fig-width: 7
%%| fig-cap: |
%%| The simplified model for data lineage that we will start with and improve as the quality of the data we have access to increases.
%%|
graph LR
subgraph "Simplified Model"
A[Application]:::entity
BA[BusinessAttribute]:::entity
BKA[BusinessKeyActivity]:::entity
BP[BusinessProcess]:::entity
BT[BusinessTerm]:::entity
DE[DataElement]:::entity
DS[DataSet]:::entity
LOB[LineOfBusiness]:::entity
PS[ProcessSteward]:::entity
TLC[TradeLifecycle]:::entity
A --IMPACTS--> A
DS --CONSUMED_BY--> A
DS --CREATED_BY--> A
A --UTILIZES--> BP
BA --MAPS_TO--> DE
BA --PART_OF--> BT
BKA --PART_OF--> BP
BP --CONSUMED_BY--> DS
BP --CREATED_BY--> DS
BP --HAS_STAGE--> TLC
DE --BELONGS_TO--> DS
LOB --CONTAINS--> BP
PS --RESPONSIBLE_FOR--> BP
classDef entity fill:#f9f9f9,stroke:#333,stroke-width:1px;
classDef relation fill:#EFEFEF,stroke:#666,stroke-width:1px;
end
## Node Definitions
/////////////////////////////////////////////////////////////////////////////
### Application
Represents a system or application that is used to process data.
#### **Properties**
- id
- name
#### **Relationships**
```{mermaid}
%%| label: fig-application-relationships
%%| fig-width: 7
%%| fig-cap: |
%%| Application Relationships
graph LR
subgraph "Application Relationships"
A[Application]:::entity
BP[BusinessProcess]:::entity
DS[DataSet]:::entity
A --IMPACTS--> A
BP --UTILIZES--> A
DS --CONSUMED_BY--> A
DS --CREATED_BY--> A
classDef entity fill:#f9f9f9,stroke:#333,stroke-width:1px;
classDef relation fill:#EFEFEF,stroke:#666,stroke-width:1px;
end
```
- CONSUMED_BY - Connects a DataAsset (source data) to an Application or BusinessProcess
- IMPACTS - Connects an Application to other related assets, such as another Application, a DataAsset, or a DataElement
- UTILIZES - Connects a BusinessProcess to an Application
Example
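A possible Application node, sketched as a standalone `CREATE` statement so that it can be run as written. The `id` and `name` values are illustrative assumptions, not taken from a real system inventory.

``` cypher
// Hypothetical values; only the label and property keys come from the model.
CREATE (a:Application {
  id: 'app1',
  name: 'Trade Capture System'
})
```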
/////////////////////////////////////////////////////////////////////////////
BusinessAttribute
Represents a property or characteristic of a BusinessTerm.
Properties
- id
- name
- description
Relationships
- HAS_ATTRIBUTE - Connects a BusinessTerm node to a BusinessAttribute
- MAPS_TO - Connects a BusinessAttribute node to a DataElement node
Example
(ba:BusinessAttribute {
id: 'ba1',
name: 'Customer ID',
description: 'Unique identifier for a customer'
})
/////////////////////////////////////////////////////////////////////////////
BusinessProcess
Represents a business process or set of activities that accomplish a specific organizational goal.
Properties
- id
- name
- description
Relationships
- CONSUMED_BY - Connects a DataAsset (source data) to an Application or BusinessProcess
- INVOLVES - Connects a BusinessProcess node to a DataAsset node or a Transformation node
- MANAGES_PROCESS - Connects a ProcessSteward node to a BusinessProcess node
- UTILIZES - Connects a BusinessProcess to an Application
Example
(bp:BusinessProcess {
id: 'bp1',
name: 'Customer Onboarding',
description: 'Process for onboarding new customers'
})
/////////////////////////////////////////////////////////////////////////////
BusinessTerm
Represents an entity or object that a BusinessProcess operates on.
Properties
- id
- name
- description
Relationships
- HAS_ATTRIBUTE - Connects a BusinessTerm node to a BusinessAttribute
Example
(bt:BusinessTerm {
id: 'bt1',
name: 'Customer',
description: 'A customer entity in the organization'
})
/////////////////////////////////////////////////////////////////////////////
ComplianceOfficer
Represents an individual responsible for ensuring regulatory compliance and overseeing the firm’s adherence to data governance policies.
Properties
- id
- name
- title
Relationships
- RESPONSIBLE_FOR - Connects a ComplianceOfficer to a Regulation, CompliancePolicy, or DataAsset
Example
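As a sketch, a ComplianceOfficer node could be created as follows. The person and title are placeholders chosen for illustration; only the label and the `id`, `name`, and `title` keys come from the model.

``` cypher
// Placeholder person, not a real officer.
CREATE (co:ComplianceOfficer {
  id: 'co1',
  name: 'Jane Doe',
  title: 'Chief Compliance Officer'
})
```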
/////////////////////////////////////////////////////////////////////////////
CompliancePolicy
Represents the firm’s internal compliance policies and guidelines.
Properties
- id
- name
- description
Relationships
- GOVERNED_BY - Connects a DataAsset or DataElement to a Regulation or CompliancePolicy
- RESPONSIBLE_FOR - Connects a ComplianceOfficer to a Regulation, CompliancePolicy, or DataAsset
Example
(cp:CompliancePolicy {
id: 'cp1',
name: 'Data Retention Policy',
description: 'Policy for data retention and disposal'
})
/////////////////////////////////////////////////////////////////////////////
DataAsset
Represents a data asset, such as a table, file, or database. A data asset describes organizational data in two layers: a technology-independent layer for communicating with non-technical stakeholders, and an implementation-specific layer for communicating with technical stakeholders.
Properties
- id
- name
- description
- type
Relationships
- ACCOUNTABLE_FOR - Connects a SystemOfRecord node to a DataAsset node
- APPLIES - Connects a DataQualityRule to a DataAsset or DataElement
- CLASSIFIED_AS - Connects a DataAsset or DataElement to a DataClassification
- CONSUMED_BY - Connects a DataAsset (source data) to an Application or BusinessProcess
- CONTAINS - Connects a DataAsset node to a DataElement node.
- DERIVED_FROM - Connects a DataAsset (representing the view) to another DataAsset (representing the source data)
- GOVERNED_BY - Connects a DataAsset or DataElement to a Regulation or CompliancePolicy
- MEASURES - Connects a DataQualityMetric to a DataAsset or DataElement
- RESPONSIBLE_FOR - Connects a DataSteward node to a DataAsset node
- RESPONSIBLE_FOR - Connects a ComplianceOfficer to a Regulation, CompliancePolicy, or DataAsset
- VIOLATES - Connects a DataQualityIssue to a DataQualityRule, DataQualityMetric, DataAsset, or DataElement
Example
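One way a DataAsset node might look, assuming a table-type asset (the `'Table'` value for `type` also appears in the CONSUMED_BY example). The remaining values are invented for illustration.

``` cypher
// Hypothetical asset; 'Table' is one assumed value of the type property.
CREATE (d:DataAsset {
  id: 'da1',
  name: 'customer_master',
  description: 'Table holding customer master records',
  type: 'Table'
})
```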
/////////////////////////////////////////////////////////////////////////////
DataClassification
Represents data security classification levels (e.g., Confidential, Restricted, Public).
Properties
- id
- name
- description
Relationships
- CLASSIFIED_AS - Connects a DataAsset or DataElement to a DataClassification
Example
(dc:DataClassification {
id: 'dc1',
name: 'Confidential',
description: 'Highly sensitive data requiring strict access control'
})
/////////////////////////////////////////////////////////////////////////////
DataElement
Represents a field (column) within a data asset.
Properties
- id
- name
- description
- dataType
Relationships
- APPLIES - Connects a DataQualityRule to a DataAsset or DataElement
- CLASSIFIED_AS - Connects a DataAsset or DataElement to a DataClassification
- CONTAINS - Connects a DataAsset node to a DataElement node.
- GENERATED_BY - Connects a DataElement node to a Transformation node
- GOVERNED_BY - Connects a DataAsset or DataElement to a Regulation or CompliancePolicy
- IMPACTS - Connects an Application to other related assets, such as another Application, a DataAsset, or a DataElement
- MAPS_TO - Connects a BusinessAttribute node to a DataElement node
- MEASURES - Connects a DataQualityMetric to a DataAsset or DataElement
- TRANSFORMS_TO - Connects a DataElement node to another DataElement node
- VIOLATES - Connects a DataQualityIssue to a DataQualityRule, DataQualityMetric, DataAsset, or DataElement
Example
(de:DataElement {
id: 'column1',
name: 'CustomerID',
description: 'Unique customer identifier',
dataType: 'integer'
})
/////////////////////////////////////////////////////////////////////////////
DataQualityIssue
Represents identified data quality problems or violations.
Properties
- id
- description
- severity
Relationships
- VIOLATES - Connects a DataQualityIssue to a DataQualityRule, DataQualityMetric, DataAsset, or DataElement
Example
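A sketch of a DataQualityIssue node. The model does not define a severity scale, so `'High'` is an assumed value, and the description is a made-up issue.

``` cypher
// Hypothetical issue; the severity scale is an assumption.
CREATE (dqi:DataQualityIssue {
  id: 'dqi1',
  description: 'Customer records with a missing Customer ID',
  severity: 'High'
})
```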
/////////////////////////////////////////////////////////////////////////////
DataQualityMetric
Represents quantifiable measures of data quality, such as completeness, accuracy, or timeliness.
Properties
- id
- name
- description
Relationships
- MEASURES - Connects a DataQualityMetric to a DataAsset or DataElement
- VIOLATES - Connects a DataQualityIssue to a DataQualityRule, DataQualityMetric, DataAsset, or DataElement
Example
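For illustration, a DataQualityMetric node for completeness (one of the measures named above) might be created like this. The `id` and the wording of the description are assumptions.

``` cypher
// Hypothetical metric instance based on the 'completeness' measure.
CREATE (dqm:DataQualityMetric {
  id: 'dqm1',
  name: 'Completeness',
  description: 'Share of required fields that are populated'
})
```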
/////////////////////////////////////////////////////////////////////////////
DataQualityRule
Represents rules that define data quality expectations and requirements.
Properties
- id
- name
- description
Relationships
- APPLIES - Connects a DataQualityRule to a DataAsset or DataElement
- VIOLATES - Connects a DataQualityIssue to a DataQualityRule, DataQualityMetric, DataAsset, or DataElement
Example
(dqr:DataQualityRule {
id: 'dqr1',
name: 'Completeness Rule',
description: 'Rule to ensure data completeness'
})
/////////////////////////////////////////////////////////////////////////////
DataSet
Represents a collection of related data assets that are DataElements or are composed of DataElements.
Properties
- id
- name
- description
Relationships
- ACCOUNTABLE_FOR - Connects a SystemOfRecord node to a DataAsset node
- AGGREGATES - Connects a DataSet node to a DataElement
- APPLIES - Connects a DataQualityRule to a DataAsset or DataElement
- IMPACTS - Connects an Application to other related assets, such as another Application, a DataAsset, or a DataElement
- INVOLVES - Connects a BusinessProcess node to a DataAsset node or a Transformation node
Example
(ds:DataSet {
id: 'ds1',
name: 'CustomerDataSet',
description: 'A dataset containing customer information'
})
/////////////////////////////////////////////////////////////////////////////
DataSteward
Represents a person responsible for managing data assets and ensuring data quality and governance.
Properties
- id
- name
Relationships
- RESPONSIBLE_FOR - Connects a DataSteward node to a DataAsset node
Example
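A minimal DataSteward node, with a placeholder name. Both property values are assumptions made for the sake of the example.

``` cypher
// Placeholder steward, not a real person.
CREATE (dst:DataSteward {
  id: 'dst1',
  name: 'Sam Example'
})
```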
/////////////////////////////////////////////////////////////////////////////
ProcessSteward
Represents a person responsible for managing and ensuring the quality and governance of a business process.
Properties
- id
- name
Relationships
- ACCOUNTABLE_FOR - Connects a ProcessSteward node to a SystemOfRecord node
- MANAGES_PROCESS - Connects a ProcessSteward node to a BusinessProcess node
Example
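A ProcessSteward node could be sketched the same way as the other node examples. The `id` and `name` shown are hypothetical.

``` cypher
// Placeholder steward, not a real person.
CREATE (ps:ProcessSteward {
  id: 'ps1',
  name: 'Alex Example'
})
```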
/////////////////////////////////////////////////////////////////////////////
Regulation
Represents specific regulatory requirements applicable to the financial services industry (e.g., GDPR, MiFID II, Dodd-Frank).
Properties
- id
- name
- description
Relationships
- GOVERNED_BY - Connects a DataAsset or DataElement to a Regulation or CompliancePolicy
- RESPONSIBLE_FOR - Connects a ComplianceOfficer to a Regulation, CompliancePolicy, or DataAsset
Example
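Using GDPR, one of the regulations named above, a Regulation node might look like this. The `id` is an assumed identifier.

``` cypher
// 'reg1' is a hypothetical id; GDPR is taken from the definition above.
CREATE (r:Regulation {
  id: 'reg1',
  name: 'GDPR',
  description: 'EU General Data Protection Regulation'
})
```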
/////////////////////////////////////////////////////////////////////////////
SystemOfRecord
Represents the authoritative data source for a given data element or piece of information.
Properties
- id
- name
- description
Relationships
- ACCOUNTABLE_FOR - Connects a SystemOfRecord node to a DataAsset node
Example
(sor:SystemOfRecord {
id: 'sor1',
name: 'Customer Master Data',
description: 'Authoritative source for customer information'
})
/////////////////////////////////////////////////////////////////////////////
Transformation
Represents a data transformation process or job.
Properties
- id
- name
- description
- type
Relationships
- GENERATED_BY - Connects a DataElement node to a Transformation node
- INVOLVES - Connects a BusinessProcess node to a DataAsset node or a Transformation node
Example
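A sketch of a Transformation node. The model does not enumerate allowed `type` values, so `'ETL'` is an assumption, as are the name and description.

``` cypher
// Hypothetical job; 'ETL' is an assumed type value.
CREATE (t:Transformation {
  id: 't1',
  name: 'Normalize Customer Names',
  description: 'Job that standardizes the formatting of customer names',
  type: 'ETL'
})
```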
## Relationships
/////////////////////////////////////////////////////////////////////////////
ACCOUNTABLE_FOR
Connects a ProcessSteward node to a SystemOfRecord node, indicating that the person is accountable for the authoritative data source.
Example
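A possible instance, written as a self-contained `CREATE` statement. The steward is a placeholder; the SystemOfRecord values reuse the `sor1` node example.

``` cypher
// 'ps1' and the steward name are hypothetical.
CREATE (ps:ProcessSteward {id: 'ps1', name: 'Alex Example'})
  -[:ACCOUNTABLE_FOR]->
  (sor:SystemOfRecord {id: 'sor1', name: 'Customer Master Data'})
```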
/////////////////////////////////////////////////////////////////////////////
AGGREGATES
Connects a DataSet node to a DataElement node, representing the aggregation of DataElements for common use.
Example
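As a sketch, the `ds1` DataSet from the node examples could aggregate the CustomerID element. The element's `id` value is an assumption.

``` cypher
// Reuses example names from the node definitions; the ids are illustrative.
CREATE (ds:DataSet {id: 'ds1', name: 'CustomerDataSet'})
  -[:AGGREGATES]->
  (de:DataElement {id: 'column1', name: 'CustomerID'})
```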
/////////////////////////////////////////////////////////////////////////////
APPLIES
Connects a DataQualityRule to a DataAsset or DataElement, indicating that the rule applies to the specific data.
Example
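One way to express this, assuming the Completeness Rule from the node examples applies to a CustomerID element:

``` cypher
// Pairing this rule with this element is an assumption for illustration.
CREATE (dqr:DataQualityRule {id: 'dqr1', name: 'Completeness Rule'})
  -[:APPLIES]->
  (de:DataElement {id: 'column1', name: 'CustomerID'})
```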
/////////////////////////////////////////////////////////////////////////////
CLASSIFIED_AS
Connects a DataAsset or DataElement to a DataClassification, indicating the security classification level of the specific data.
Example
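A minimal sketch, assuming a hypothetical customer table that is classified with the `dc1` (Confidential) node from the DataClassification example:

``` cypher
// The DataAsset is hypothetical; 'dc1' reuses the DataClassification example.
CREATE (d:DataAsset {id: 'da1', name: 'customer_master', type: 'Table'})
  -[:CLASSIFIED_AS]->
  (dc:DataClassification {id: 'dc1', name: 'Confidential'})
```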
/////////////////////////////////////////////////////////////////////////////
CONSUMED_BY
Connects a DataAsset (source data) to an Application or BusinessProcess that consumes the data without creating or transforming it.
#### **Example**
``` cypher
(d:DataAsset {type: 'Table'})-[:CONSUMED_BY]->(a:Application)
```
or
``` cypher
(d:DataAsset {type: 'Table'})-[:CONSUMED_BY]->(bp:BusinessProcess)
```
/////////////////////////////////////////////////////////////////////////////
CONTAINS
Connects a DataAsset node to a DataElement node.
Example
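For illustration, a table-type DataAsset containing one column could be created like this. All property values are assumptions.

``` cypher
// Hypothetical table and column.
CREATE (d:DataAsset {id: 'da1', name: 'customer_master', type: 'Table'})
  -[:CONTAINS]->
  (de:DataElement {id: 'column1', name: 'CustomerID', dataType: 'integer'})
```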
/////////////////////////////////////////////////////////////////////////////
DERIVED_FROM
Connects a DataAsset (representing the view) to another DataAsset (representing the source data), indicating that the view is derived from the source data.
Example
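A sketch of a view derived from a source table. The `'View'` type value and both asset names are assumptions; the model does not list the allowed types.

``` cypher
// Hypothetical view and source table.
CREATE (v:DataAsset {id: 'da2', name: 'active_customers_v', type: 'View'})
  -[:DERIVED_FROM]->
  (src:DataAsset {id: 'da1', name: 'customer_master', type: 'Table'})
```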
/////////////////////////////////////////////////////////////////////////////
GENERATED_BY
Connects a DataElement node to a Transformation node, indicating that the field was generated by a particular transformation process.
Example
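A possible instance, assuming a hypothetical name-cleaning job that produces a derived field:

``` cypher
// Both nodes are illustrative; none of these values come from the model.
CREATE (de:DataElement {id: 'column2', name: 'CustomerNameClean', dataType: 'string'})
  -[:GENERATED_BY]->
  (t:Transformation {id: 't1', name: 'Normalize Customer Names', type: 'ETL'})
```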
/////////////////////////////////////////////////////////////////////////////
GOVERNED_BY
Connects a DataAsset or DataElement to a Regulation or CompliancePolicy, indicating that the data is subject to specific regulatory requirements or policies.
Example
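As a sketch, a hypothetical customer table governed by GDPR (one of the regulations named in this document) might be recorded as:

``` cypher
// The DataAsset and the ids are hypothetical.
CREATE (d:DataAsset {id: 'da1', name: 'customer_master', type: 'Table'})
  -[:GOVERNED_BY]->
  (r:Regulation {id: 'reg1', name: 'GDPR'})
```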
/////////////////////////////////////////////////////////////////////////////
HAS_ATTRIBUTE
Connects a BusinessTerm node to a BusinessAttribute node.
Example
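Using the Customer term and Customer ID attribute from the node examples, the relationship could be sketched as:

``` cypher
// Values reuse the BusinessTerm and BusinessAttribute node examples.
CREATE (bt:BusinessTerm {id: 'bt1', name: 'Customer'})
  -[:HAS_ATTRIBUTE]->
  (ba:BusinessAttribute {id: 'ba1', name: 'Customer ID'})
```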
/////////////////////////////////////////////////////////////////////////////
IMPACTS
Connects an Application to other related assets, such as another Application, a DataAsset, or a DataElement, to represent the influence the Application has on these assets, particularly when changes occur within the Application.
#### **Example**
``` cypher
(a:Application)-[:IMPACTS]->(d:DataAsset {type: 'Table'})
```
or
``` cypher
(a:Application)-[:IMPACTS]->(de:DataElement)
```
or
``` cypher
(a1:Application)-[:IMPACTS]->(a2:Application)
```
/////////////////////////////////////////////////////////////////////////////
INVOLVES
Connects a BusinessProcess node to a DataAsset node or a Transformation node, indicating that the process involves the use or modification of the data asset or transformation.
#### **Example**
``` cypher
(bp:BusinessProcess)-[:INVOLVES]->(d:DataAsset)
```
or
``` cypher
(bp:BusinessProcess)-[:INVOLVES]->(t:Transformation)
```
/////////////////////////////////////////////////////////////////////////////
MANAGES_PROCESS
Connects a ProcessSteward node to a BusinessProcess node, indicating that the person is responsible for managing the business process.
Example
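A minimal sketch in which a placeholder steward manages the Customer Onboarding process from the BusinessProcess example:

``` cypher
// The steward is hypothetical; 'bp1' reuses the BusinessProcess example.
CREATE (ps:ProcessSteward {id: 'ps1', name: 'Alex Example'})
  -[:MANAGES_PROCESS]->
  (bp:BusinessProcess {id: 'bp1', name: 'Customer Onboarding'})
```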
/////////////////////////////////////////////////////////////////////////////
MAPS_TO
Connects a BusinessAttribute node to a DataElement node.
Example
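Assuming the Customer ID attribute is implemented by the CustomerID column from the DataElement example, the mapping could be sketched as:

``` cypher
// Pairing these two example nodes is an assumption for illustration.
CREATE (ba:BusinessAttribute {id: 'ba1', name: 'Customer ID'})
  -[:MAPS_TO]->
  (de:DataElement {id: 'column1', name: 'CustomerID'})
```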
/////////////////////////////////////////////////////////////////////////////
MEASURES
Connects a DataQualityMetric to a DataAsset or DataElement, indicating that the metric is used to measure the quality of the specific data.
Example
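For illustration, a hypothetical completeness metric measuring a single element:

``` cypher
// The metric instance and its id are hypothetical.
CREATE (dqm:DataQualityMetric {id: 'dqm1', name: 'Completeness'})
  -[:MEASURES]->
  (de:DataElement {id: 'column1', name: 'CustomerID'})
```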
/////////////////////////////////////////////////////////////////////////////
RESPONSIBLE_FOR
Connects a ComplianceOfficer to a Regulation, CompliancePolicy, or DataAsset, indicating their responsibility for ensuring compliance.
Example
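A possible instance, with a placeholder officer responsible for the Data Retention Policy from the CompliancePolicy example:

``` cypher
// The officer is a placeholder; 'cp1' reuses the CompliancePolicy example.
CREATE (co:ComplianceOfficer {id: 'co1', name: 'Jane Doe'})
  -[:RESPONSIBLE_FOR]->
  (cp:CompliancePolicy {id: 'cp1', name: 'Data Retention Policy'})
```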
/////////////////////////////////////////////////////////////////////////////
TRANSFORMS_TO
Connects a DataElement node to another DataElement node, representing the transformation from one field to another in a data transformation process.
Example
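As a sketch, a raw field transformed into a cleaned field might be recorded as follows. Both element names are invented for the example.

``` cypher
// Hypothetical source and target fields.
CREATE (raw:DataElement {id: 'column3', name: 'CustomerNameRaw', dataType: 'string'})
  -[:TRANSFORMS_TO]->
  (clean:DataElement {id: 'column2', name: 'CustomerNameClean', dataType: 'string'})
```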
/////////////////////////////////////////////////////////////////////////////
UTILIZES
Connects a BusinessProcess to an Application that is used to accomplish the process.
Example
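A minimal sketch in which the Customer Onboarding process uses a hypothetical application:

``` cypher
// 'bp1' reuses the BusinessProcess example; the Application is hypothetical.
CREATE (bp:BusinessProcess {id: 'bp1', name: 'Customer Onboarding'})
  -[:UTILIZES]->
  (a:Application {id: 'app1', name: 'Client Onboarding Portal'})
```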
/////////////////////////////////////////////////////////////////////////////
VIOLATES
Connects a DataQualityIssue to a DataQualityRule, DataQualityMetric, DataAsset, or DataElement, indicating that the issue represents a violation of the rule or metric or is related to the specific data.
Example
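One way to record an issue against a rule, assuming a hypothetical issue that violates the Completeness Rule from the DataQualityRule example:

``` cypher
// The issue and its severity value are assumptions.
CREATE (dqi:DataQualityIssue {id: 'dqi1', description: 'Customer records with a missing Customer ID', severity: 'High'})
  -[:VIOLATES]->
  (dqr:DataQualityRule {id: 'dqr1', name: 'Completeness Rule'})
```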
/////////////////////////////////////////////////////////////////////////////